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1 Introduction 



Hydroelectric scheduling is an important planning problem at The Pacific Gas and Electric Company. 
Depending on hydrological conditions, PG&E’s hydroelectric power plants generate roughly 10-20% 
of the system’s annual demand for electric energy. Other energy sources include gas-fired plants, 
the Diablo Canyon nuclectr plant, and imports from external sources; for simplicity, we collectively 
refer to these as ‘‘thermcd” energy sources. Thermal energy costing is complex, but for the purposes 
of the model we describe in this paper, we assume a nonlinear convex thermal cost function. Given a 
hydro generation schedule, this function provides the cost of operating the thermal system to satisfy 
the remaining demand for energy. An important source of the thermal cost function’s nonlinearity is 
the different efficiencies of various thermal plants. Hydro units are attractive because they generate 
energy at a very low variable cost and permit flexible scheduling since they can quickly ramp up to 
full power. However, due to reservoir and generation capacities and seasonal variations in natural 
inflow (via precipitation and snowmelt), they cannot be operated at full capacity year-round. The 
scheduling of the hydroelectric system is further complicated because the volume of future natural 
inflow into the system’s reservoirs is uncertain. The objective of the model we describe is to operate 
the hydro-thermal system with minimum expected cost for a two year planning horizon. Restated: 
we wish to operate the hydro system so as to maximize expected savings from avoided thermal 
generation costs. While we give an overview of the hydroelectric scheduling model and coordination 
(with the thermcd system) algorithm in §2, the reader is referred to Jacobs et al. [10] for a more 
detailed description of the ongoing project at PG<kE as well as for references to other approaches to 
hydroelectric scheduling. 

Solutions from hydroelectric scheduling models with deterministic natural inflow forecasts can be 
unsatisfactory. Hydro generation decisions in such models are made under the incorrect assumption 
that forecast levels of natural inflow are ensured in forthcoming months; if mean or median inflow 
values are used, the resulting solutions may fail to hedge against dry inflow scenarios. As a result, 
in a dry scenario, inefficient and costly thermal plants must be brought on line to satisfy demand 
for electricity. On the other hand, a conservative strategy derived from a relatively dry forecast 
may result in forced spills due to finite storage and generation capacities. These spills represent lost 
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potential energy and lost future cost savings. Stochastic progranruning formulations allow natural 
inflow to be modeled as random parameters with known distribution, but the size of the resulting 
mathematical programs can be formidable. Decomposition or L-shaped algorithms [3,13] provide an 
attractive approach to such problems because they take advantage of special structure. 

The aim of this paper is to present an enhanced decomposition algorithm for multistage stochas- 
tic programs and to examine its performance on a set of hydroelectric scheduling problems. The 
remainder of the paper is organized as follows. In §2 the hydroelectric scheduling module of the 
stochastic hydro-thermal optimization problem is described; a collection of test problems is also de- 
tailed. In §3 we briefly review Benders decomposition algorithm applied to multistage problems and 
discuss valid cut generation. The empirical performance of several enhancements to the traditional 
algorithm is presented in §4. In §5 we compare run-times of the enhanced decomposition algorithm 
and direct linear programming optimizers; the paper is summarized in §6. 



2 The Model 



A hydrological basin may be viewed as a network consisting of a number of reservoirs (nodes) that 
are interconnected by rivers, canals, and spillways (arcs); energy is generated as water flows through 
powerhouses. Given marginal values of energy, we model an individual basin hydroelectric scheduling 
problem as a T-stage stochastic linear program with recourse (SLP-T): 

T 

maximize 



t=i wicn, 

subject to ^ + AtX^' = 0 < x^* < 6 fit 

for 

where Bq = 0. 

The sample space for stage t is denoted fit, and a sample point (scenario) in fit is denoted cjf A 
stage i >2 scenario, cJt, has a unique stage i — 1 ancestor denoted a(u;t), and a stage i < T scenario 
has a set of descendant scenarios denoted A(cJt)* At is an rrit x nt matrix and the remaining matrices 
and vectors are dimensioned to conform. A stage t realization, ^ vector in 

3?^*, where Nt = 2nt + rut. We assume a finite number of scenarios and a probability mass function 
given by P {^t = = pT • notational convenience, w^e have created a first stage sample 
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space, Cti that is a singleton set where represents the known state at the time decisions are 

made in the first stage; clearly, has value one. SLP-T has structural constraints 

and decision variables where \Qt\ denotes the cardinality of Qf 

We may, nominally, regard At as the node-arc incidence matrix for the hydrological network. 
The actual form of At is more complex for several reasons. First, non-network side constraints 
must be incorporated; e.g., decrees constrain the volume of water in a subset of the reservoirs to 
a minimum level. Second, subperiod modeling is necessary to capture differences in peak and off- 
peak values of energy. Third, stages contain different numbers of time periods depending on the 
corresponding level of uncertainty in natural inflow. For example, summer months are relatively dry 
and multiple time periods are incorporated in a single stage; the snowmelt season in the spring, on 
the other hand, is a period of greater uncertainty and shorter time stages are required. In addition, 
longer time stages are typically used in the later stages of the model. 

In SLP-T, decisions occur and uncertainties unfold in the following manner. The first stage 
hydro generation and storage decisions are made with distributional information on future data; 
next, a specific scenario is revealed and second stage decisions are made knowing this data, the first 
stage decision, and conditional probability distributions on future inflows .. .The goal is to operate 
the hydro basin with maximum expected benefit, in terms of avoided thermal generation cost, for 
T time stages. The model has essentially three arc types: energy generation arcs, other spatial 
water transport arcs, and “transition-in-time” arcs. The matrix Bt contains arcs of the third type 
that are used to equate the amount of water left in a reservoir at the end of one stage with the 
amount of water in the same reservoir at the beginning of the next stage. Generation, spill, and 
reservoir capacities appear as simple bounds on the three arc types. Initial reservoir volumes are 
contained in b\ \ subsequent vectors contain the uncertain natural inflows. Energy generation arcs 
have objective function coefficients that represent marginal values of energy in terms of dollars per 
thousand acre foot; these values are stochastic and usually larger in drier natural inflow scenarios. 
Other arcs typic2dly have objective function coefficients of zero. In general, the stochastic parameters 
exhibit interstage dependence. For instance, relatively large inflows at lower elevations in the winter 
months are often coupled with a growing snowpack at higher elevations which will, in turn, lead 
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to large inflows when melting occurs in the spring. Methods of dealing with finite horizon effects 
include minimum final period reservoir storage requirements or a final period future value function; 
both methods may involve scenario dependent constraints. 

The individual basin programs described above are subproblems of a larger multistage stochastic 
nonlinear program. Hydro scheduling decisions must be made in a number of basins. The only 
constraints linking the different basins are the load constraints; these require that at each possible 
point in time, demand for energy be satisfied. The Dantzig- Wolfe decomposition principle may be 
applied to create a nonlinear master problem that contains proposed hydro solutions, the demand 
constraints, and the nonlinear thermal cost function. The linear subproblem separates into a sum of 
independent subproblems by hydro basin of the form of SLP-T. The Dantzig-Wolfe master generates 
scenario dependent marginal values of energy for the subproblems and the subproblems, in turn, 
paiss proposed hydro solutions to the master problem. We refer the reader to Eiselt, Pederzoli, and 
Sandbloom [5] for a discussion of nonlinear Dantzig-Wolfe decomposition. 



Name 


Size 


Dim 


Deterministic 

Equivalent 


Moke3.9 


169 X 820, 337 x 1713, 673 x 3298 


7 


7237 X 28473 


Ybsf3.9 


319 X 1559, 637 x 3119, 1273 x 6239 


12 


13687 X 53489 


Moke4.45 


57 X 271, 113 X 548, 337 x 1713, 673 x 3298 


7 


35736 X 140656 


Ybsf4.45 


107 X 519, 213 X 1039, 637 x 3119, 1273 x 6239 


12 


68012 X 265836 



Table 1: Test Problems 



In §4, we examine the performance of an enhanced decomposition algorithm on a collection of 
test problems that are preliminary versions of individual basin hydroelectric scheduling problems as 
described above. The test problems are models with different time horizons, stage definitions, and 
discretizations of natural inflow distributions. The models are based on two of the larger hydrological 
basins in the PG&E system: Mokelumne (Moke) and Yuba-Bear-South Feather (Ybsf). In Table 1, 
“Name” indicates the hydrological basin, the number of stages and number of scenarios; for example, 
Moke3.9 is a three stage model of the Mokelumne bcisin with IQ 3 I = 9. “Size” gives the row and 
column dimensions of the At matrices for each stage. The dimension of the domain of the recourse 
function is the number of large reservoirs in a basin; this value is denoted “Dim.” “Deterministic 
Equivalent” gives the row and column dimensions of the SLP-T formulation. 
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3 Benders Decomposition and Valid Cut Generation 



We may write SLP-2, in minimization form, as follows: 

minimize { ciXi + , w) } , 

^jXl = i] 
xj > 0 



( 1 ) 



where 



C2X2 



h{x\,uj)=^ minimize 

subject to A 2 X 2 =62+51X1 

Xn 



( 2 ) 



1^2 ^ 0 - 

Note the simple upper bounds are not explicit; when this is the case, it may be assumed that they 
have been included in the structural constraints. 



Benders decomposition for SLP-2 (see Van Slyke and Wets [13]) is a resource directed decompo- 
sition. A first stage decision is passed to the right-hand-sides of the second stage recourse problems 
(2) which then ax:t optimally under each scenario u;. Supports of the piecewise linear convex recourse 
function, Eufh{x\,uj), called cuts, are derived from the dual of (2) and are subsequently passed back 
to the first stage master problem and the process repeats. Under the assumption that second stage 
“infeasibilities” are modeled by a penalty function, a forward pass of the algorithm generates a feasi- 
ble decision and hence an upper bound on the optimal objective function value. We can also obtain 
a lower bound on the optimal objective function value via the master program’s objective function 
value: the cuts collected so far provide an outer linearization of the recourse function. The extension 
of this procedure to SLP-T can be viewed as follows. SLP-T is first decomposed into two stages: 
stage 1 and stage 2, After the first stage problem passes resources to the right-hand-sides 

of the stage 2 subproblems, there are IQ2I linear programs to solve. Each of these linear programs 
is solved via decomposition into two stages: stage 2 and stage 3,...,T, and so on. This nested 
method is the “traditional” nested Benders decomposition approach to multistage stochastic linear 
programming (see Birge [3]); a more formal description is provided in §4.4. 

We now provide conditions under which valid cuts can be generated; these results prove useful 
in developing some of the enhancements to the traditional algorithm described in §4. A valid cut is 
defined to be a cut that lies below the recourse function. Lemmas 1 and 2, state that dual feasible 
vectors generate valid cuts for SLP-2 and SLP-T, respectively. 
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Lemma 1 Consider SLP-2 with |Q| = K. are dual feasible for the second stage 

subproblems then these dual vectors generate a valid cut for the first stage master program. 

Proof 

Let = {tt : t[A 2 < ^ 2 } denote the dual feasible region of (2). By hypothesis, G V w G 
thus 

^{b^’^BiXi) < h{xuu) = ir^ib^’^BiXi) Vx,. 

G n**' 

By taking expectations we see Gx\ + ^ is a valid cut where G = E^^7^**^Bl and g = Eu,^^b 2 . M 

With the exception of the fined stage, the difference in the multistage setting is that subproblems 
generating cuts for their ancestors contain their own cuts. Lemma 1 implies dual feasible vectors to 
the stage T subproblems generate valid cuts for their stage T— \ ancestors, but a new result is needed 
for the general case. Lemma 2 ensures that if the stage f (2 < / < T) subproblems contain valid 
cuts then they will, given dual feasible vectors, generate valid cuts for their stage t — 1 ancestors. 
The stage / (1 < f < T) subproblem under scenario u;*, denoted sub(u;t), has the following form: 

minimize c^^x^f* -f O'f* 

subject to AtX^f* = -1- 

sub(uj,) -G't'xT + e ></p 

xT > 0 . 

The rows of the matrix contain cut gradients; the elements of the vector g^* are cut intercepts; 
and e denotes the vector of all Ts. As the algorithm proceeds, the row dimension of these quantities 
will grow. 

Lemma 2 Suppose |A(tJt)| = A', 1 < f < T — 2, and the descendants of sub{ujt) contain valid cuts. 
// )] are dual feasible vectors for the descendants of sub(ujt) then these 
dual vectors generate a valid cut for sub{ut). 

Proof 

Denote the value of subproblem (3) by Et-iixt-iiUJtyG'f^g^*). Let ^ yaVid 

cuts and define ft{xt^ujt+i) = ). For each G A(cjf) there exists a 

finite set of cuts, denoted by conditional stage t -f 2 recourse function 
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is completely specified; let ht{xt,uji^i) = Now suppose (^^+ 1 * > ^7+t* ) 

dual feasible for the program corresponding to Then 

KiV (CV + ^t^t) + &71VCV < < ht(xt,u;^^^) V x,. (4) 

The left hand inequality is immediate by writing the program corresponding to its 

dual form. The right hand inequality follows as the cut constraints in ft{xt^ujt^i) are dominated by 
the cuts of ht{xtiUft+\). The desired result is obtained by multiplying (4) by summing 

over all Wj+i E A(a;t). ■ 

4 The Enhanced Decomposition Algorithm 

In this section, four enhancements to the traditional nested Benders algorithm are presented: Warm 
start techniques obtain ‘‘good” initial basic feasible solutions for subproblems based on optimal basis 
information from previous subproblem solutions. Advanced start procedures generate preliminary 
cuts prior to initiating a formal Benders algorithm. The multicut Benders decomposition algorithm 
returns one cut for each descendant of a particular subproblem instead of a single aggregate cut. 
Tree traversing strategies prescribe the order in which subproblems of the decision tree are solved. 

The enhanced algorithm is implemented in FORTRAN 77, and the computational results we 
present are from a Hewlett Packard 9000/750 workstation. The algorithm uses NETSIDE, a special 
purpose code that solves the minimum cost flow network problem with side constraints, developed 
by Kennington and Farhangian as the subproblem solver. NETSIDE is based on a specialization of 
the primal simplex method (see Kennington and Helgason [11] and Barr et al. [2]); in our setting, 
the set of side constraints includes Benders cuts. 

The performance of a particular algorithmic enhancement will be analyzed with respect to a base 
case strategy which is the strategy we recommend. Thus, we evaluate the marginal effect of each 
enhancement. The base case strategy and its performance on the four test problems is summarized 
in Tables 2 and 3; the details are presented in the respective subsections that follow. All problems 
are solved to within an objective function tolerance of 0.01%. Reported CPU times exclude input 
and output operations. The # subproblems column of Table 3 gives the number of subproblems 
solved in each stage during the course of the algorithm. 
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Warm Start 


Candidate list of length 20 until relative error < 5% 


Advanced Start 


Prespecified Decisions Method with shared cuts 


Multicut 


Yes 


Tree Traversing Strategy 


Fastpass 



Table 2: Base Case Strategy 



Name 


subproblems 


# iterations 


CPU (sec.) 


Moke3.9 


10-59-99 


10 


55.4 


Ybsf3.9 


7-47-72 


7 


119.8 


Moke4.45 


10-64-274-459 


10 


221.4 


Ybsf4.45 


8-83-205-369 


8 


404.0 


Average 


N/A 


8.8 


200.2 



Table 3: Base Case Strategy Performance 



4.1 Warm Start 

Similar subproblems are repeatedly solved during the course of a decomposition procedure. Tech- 
niques that take advantage of optimal basis information from previous subproblem solutions are 
critical for developing efficient algorithms. Wets [14] and Garstka and Rutenberg [6] have proposed 
bunching and sifting techniques, respectively for solving a large number of similar linear programs. 
The sifting method, however, requires component-wise independence of the right-hand-side and de- 
terministic objective function coefficients; our test problems violate these requirements. Moreover, 
in our experiments there were a low number of “repeat” optimal bases which seemed to indicate 
bunching might not occur. While the primary computational challenge in the two-stage problem 
lies in the solution of a large number of similar second stage problems each lieraiion, the greatest 
potential for computational savings in a multistage problem, with only a few stochastic branches 
each stage, rests in an ability to select good initial bases for each subproblem. 

We propose a warm start technique in which initial bases are selected from a candidate list until 
the relative error is sufficiently small. In subsequent iterations, subproblems are initialized with 
the basis from their previous optimal solution. The columns of a basic solution of subproblem (3) 
can be partitioned into a network component and a side constraint component; see Kennington and 
Helgason [11]. The heuristic used to select the best basis from the candidate list for a particular 
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subproblem proceeds as follows: 

(i) Calculate network flows from the network component of each candidate list basis. 

(ii) Calculate solutions for the entire subproblem based on each network flow solution. 

(iii) Determine the objective function value of eeich solution. 

(iv) Select the candidate list basis that has the minimum corresponding objective function value. 

Step (i) can be performed quickly due to the tree structure of the network basis. In step (ii) 
we substitute the solution from (i) into the side constraints and generate a feasible solution from 
slack, surplus, artificial, and future cost variables. The “best” basis is then determined from the 
objective function value of each solution. Note that the value calculated in (iii) is only an estimate 
of the actual objective function value that the basis will yield since the side constraint component of 
the basis is not considered. We ignore this component due to the computational effort required for 
refactorization and the fact that the network constitutes most of the subproblem. Minimal storage 
is required for each basis: arc indices within a pointer structure, a list of upper bound arc indices 
for the network, and a list of column indices for the side constraints. See Jacobs [9] for the details of 
the NETSIDE basis insertion procedure. Warm start parameters that the user must select are the 
maximum size of the candidate lists and the relative error at which the method switches from the 
candidate list heuristic to simply reusing the previous optimal basis for each subproblem; reasonable 
values for these parameters are suggested below in the computational results discussion. 

Computational Results 

Columns 2-4 of Table 4 detail the performance of the algorithm when each subproblem solution 
begins with an all-artificial basis. Similarly, columns 5-7 show the empirical results of the strategy 
in which the candidate list heuristic is not used and we simply recall the previous optimal basis for 
each subproblem; if such a basis does not exist (because a particular subproblem has not yet been 
solved) then the optimal basis of another subproblem from the same stage is used. The “x increase” 
column is defined as T/Tic and the “% increase” column as (T — The) /The • 100. The denotes the 
running time of the base case strategy and T the modified strategy; e.g., the base case with no warm 
start. If the % increase column is 20 then it is to be read: “the current strategy’s running time is 
20% longer than the running time of the base case strategy.” Table 4 reveals the dramatic impact 
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of warm starts and also indicates that substantial computational savings can result from using the 
heuristic to select good bases early in the algorithm. The values of 5% for the switch over tolerance 
and 20 for the candidate list length (see Table 2) were determined by varying these parameters on 
a wider variety of test problems using the base case strategy with and without an advanced start. 



Name 


No Warm Start 


Recall Previous Basis 


iter. 


CPU (sec.) 


X increase 


iter. 


CPU (sec.) 


% increase 


Moke3.9 


13 


909.1 


16.4 


12 


75.0 


35.4 


Ybsf3.9 


8 


1973.3 


16.5 


8 


163.4 


36.4 


Moke4.45 


11 


3793.4 


17.1 


11 


263.4 


19.0 


Ybsf4.45 


11 


14267.8 


35.3 


9 


526.1 


30.2 


Average 


10.75 


5253.9 


21.3 


10 


257.0 


30.3 



Table 4: Warm Start Procedures 



4.2 Advanced Start 

A Benders decomposition algorithm initiated with no preliminary cuts generates myopic first it- 
eration decisions and “extreme” decisions in the early iterations. The idea behind an advanced 
start method is to calculate preliminary cuts to help guide the early iterations of the algorithm. A 
technique utilized by Infanger [8] involves first solving, by Benders decomposition, the much smaller 
expected value problem in which the stochastic parameters of SLP-T are replaced by their population 
means. The cuts produced in the process are valid for SLP-T if the stochastic parameters exhibit 
interstage independence and appear only in Bt and this claim is easily verified via Lemmas 1 
and 2. However, in the stochastic hydroelectric scheduling problems the objective function coeffi- 
cients are random and the stochastic parameters are temporally correlated; thus the expected value 
method is not applicable and we seek an alternate approach. 

A “prespecified decisions” method for computing preliminary cuts requires, for each stage / < T, 
a collection of decision vectors: : i = 1, ..., The basic idea is to generate preliminary cuts 

by solving subproblems with right-hand-sides of the form: . A naive implementation 

involves solving the descendants of each scenario tree node at each of the prespecified decisions and 
computing the corresponding cuts. In the streamlined approach described in this section, we select 
a single scenario Ut on each stage and solve the descendants of scenario and then pass a cut 
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back not only to sub(a?j_i) but also to all its stage ^ — 1 neighbors via the dual sharing formula 

described below; Figure 1 summarizes the method. In a symmetric four stage, 45 scenario problem 

with 1, 3, 15, and 45 scenarios on each stage and Nt = 3, the number of subproblems solved in the 

streamlined method is and 33 while the naive method is 189. 

define 1 

do < = T downto 2 
do f = 1 to Nt^i 
do iJi 6 

form RHS of sub(a;|): 
solve sub(u;t) 
enddo 

pass cut to sub(ci;i_i) V 
enddo 
enddo 

Figure 1: Prespecified Decisions Method - Sharing Duals 

The only relevant components of the prespecified decision vectors are ones that contribute to 
the product this corresponds to end-of-stage reservoir storage volumes in the hydro 

scheduling problems. Reservoirs have natural lower and upper storage bounds, and in the absence 
of additional information (e.g., historical storage levels or values from prior optimizations), the 
prespecified decisions are defined as percentiles between the upper and lower bounds. In particular, 
we use three prespecified vectors at 20%, 50%, and 80%, and we define Ut = where = 

A'j} and where the ceiling function f] gives the smallest integer greater than or equal to 
its argument. The order of cut computation is relevant; for example, in a three stage problem all 
preliminary cuts are passed to the second stage prior to passing any cuts to the first stage. In this 
way, the maximum possible amount of information is subsequently passed to the first stage. 

The Dual Sharing Formula 

The dual of sub(u;i) (see program (3)) with explicit simple bounds may be written: 

maximize 7Tp ^6^^* -f -f ct*f*g^* — 

subject to ~ 

= 1 

> 0 . > 0 - 

In describing the dual sharing formula, super and subscripts are suppressed for clarity. Suppose 
we have solved a stage t subproblem with data (c,G, ^4) and obtained optimal dual prices (7r,d,/i). 
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Given another stage t subproblem with data (c,G,A) feasible dual prices can be generated directly 
from via 

= (’T,d,[iyl-dG-c]'''j . (5) 

Dual feasibility of (7 t,q,^) is easily verified by substitution. The [r]"*" notation means take the 
positive part of the vector v, component-wise. We refer to (5) as the dual sharing formula. Note (5) 
is also valid for stage T subproblems when the (vacuous) cut gradient matrix and associated dual 
variables are dropped. 

Suppose we have solved the descendants of sub(LDf); using the corresponding optimal dual vari- 
ables, the dual sharing formula may be applied to compute a valid cut for sub(o;[). To compute this 
cut, we match elements of A(u;t) and A(tj|). (Another possibility is to select the dual vectors that 
produce the strongest cut at a particular stage i decision.) By Lemmas 1 and 2 these feasible dual 
vectors generate valid cuts. Note that a cut generated by applying (5) can be weak; the extreme case 
is a positive price on an infinite simple bound. In the hydroelectric scheduling problems, however, 
the only arcs that can have nonzero shared /i’s have natural finite bounds. 

Computational Results 

Columns 2-4 of Table 5 detail the performance of the base case strategy with no advanced start and 
columns 5-7 show the performance when we use the naive advanced start without the dual sharing 
formula. In selecting an advanced start procedure, one must balance the computational benefit 
that the preliminary cuts yield with the cost of generating the cuts. Table 5 shows the base case 
strategy of solving only a subset of subproblems on each stage and utilizing the dual sharing formula 
provides an attractive advanced start. The average relative error after the first iteration for the 
base case strategy and advanced start strategy with no sharing is 6.8% and 6.1%, respectively, and 
the corresponding average CPU times for the first iteration are 67.6 sec. and 117.8 sec. Thus the 
slower method produces slightly stronger cuts, but the empirical results indicate the computational 
expense is too high. The table reveals, however, the naive advanced start procedure is preferable 
to none at all, and that the advanced start procedures provide greater computational savings in the 
four stage problems than in the three stage problems. 
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Name 


No Advancec 


Start 


Naive Advanced Start 


iter. 


CPU (sec.) 


% increase 


iter. 


CPU (sec.) 


% increase 


Moke3.9 


15 


68.4 


23.5 


10 


60.0 


8.3 


Ybsf3.9 


9 


118.5 


-1.1 


7 


129.1 


7.8 


Moke4.45 


13 


318.9 


44.0 


9 


255.3 


15.3 


Ybsf4.45 


13 


614.7 


52.2 


9 


553.1 


36.9 


Average 


12.5 


280.1 


29.7 


8.8 


249.4 


17.1 



Table 5: Advanced Start Procedures 



4.3 Multicut Algorithms 

Birge and Louveaux [4] introduced the multicut L-shaped algorithm for SLP-2 (1). The traditional 
aggregate cut algorithm creates a master program by replacing program (1) by 

6 and sequentially adding cuts of the form 6 > iteration. 

The multicut version instead replaces h{xi,u) by 0^ for all a; G ^ and each iteration appends |Q| 
cuts of the form 0^ > (n'^Bi)xi + When compared with the aggregate cut algorithm, the 

multicut algorithm has the disadvantage of requiring more decision variables; similarly, after a given 
number of iterations, it also maintains a larger number of cut constraints in the master program. 
This, however, is countered by the advantage of increased resolution of the recourse function. In 
practice we expect multicut algorithms will typically take fewer iterations than their aggregate cut 
counterparts. (Birge and Louveaux [4], however, present a counter example showing this is not always 
the case.) The multicut algorithm extends to the multistage setting in a straightforward fashion. 
Each descendant scenario passes back a cut to its ancestor and the ancestor objective function has 
the form: 

ct'xf+ 

Other generalizations of the multicut algorithm are possible; the descendants can be partitioned 
into disjoint sets and a “0” defined for each set. In the multicut algorithm described above, each set 
of the partition is a singleton, and the aggregate cut algorithm is the special case where the only 
set of the partition is A{u?t) itself. A coarse partition version of the multicut algorithm should be 
particularly attractive when the number of scenarios is large. The other enhancements discussed in 
this paper can run in either aggregate cut or multicut mode. 
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Computational Results 



Name 


# iterations 


CPU (sec.) 


% increase 


Moke3.9 


22 


107.3 


93.7 


Ybsf3.9 


13 


146.3 


22.1 


Moke4.45 


18 


318.1 


43.7 


Ybsf4.45 


17 


662.0 


63.9 


Average 


17.5 


308.4 


55.9 



Table 6: Single Cut 



Table 6 details the performance of the strategy in which the multicut method is replaced by the 
single cut procedure in the base case strategy. The average number of iterations in the multicut 
procedure is about half that of the single cut method. However, due to the quality of the warm 
start procedure the corresponding running times are not halved. Nevertheless, Table 6 shows the 
multicut method yields a significant computational advantage. The relative error as a function of 
CPU time is plotted in Figure 2 for three strategies on test problem Moke4.45. Note (i) the faster 
convergence of the multicut algorithm, (ii) the additional computational effort but improved initial 
relative error of the advanced start procedure, and (iii) the effect of the warm start on the time per 
iteration as the algorithm proceeds. 
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Figure 2: Relative error vs. CPU time - Moke4.45 
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4A Tree Traversing Strategies 



For brevity, we refer to the nested Benders decomposition algorithm described in §3 as the shuffle tree 
traversing strategy. Abrahamson [1] and Wittrock [15] developed strategies other than shuffle for 
deterministic multistage linear programs; we consider two of these: fastpass and cautious. Gassmann 
[7] tested shuffle^ fastpass, and cautious in the stochastic setting. Abrahamson, Wittrock, and 
Gassmann found fastpass to be an effective strategy. In addition to these three strategies, we 
present two new classes of tree traversing strategies. 

The two extreme strategies are shuffle and cautious. Shuffle only goes backward up the tree if it 
cannot go forward; i.e., it solves all the stage / 1, . . . ,T subproblems explicitly (within a tolerance) 

prior to passing cuts back to stage t. On the other hand, cautious never goes forward down the tree 
unless all cuts that would be passed back to stage t — 1 are redundant. Fastpass is a strategy “half- 
way” between shuffle and cautious. We introduce the e-shuffle and e-cautious strategies: e-shuffle is 
a strategy that is in between shuffle and fastpass; it is less hesitant to go backward up the tree than 
the former but more hesitant than the latter; e-cautious is similarly in between cautious and fastpass. 
As e increases both c-strategies become more like fastpass. As e shrinks, the two e-strategies more 
closely mimic their (non e) counterparts. 

The primary concern with shuffle is it may spend too long solving, for example, the stage 2, . . . , T 
subproblems with respect to a poor first stage decision. The quality of the information passed up 
the tree is high (the cuts are supports of the recourse function), but too much effort may be spent 
computing the cuts. Cautious, on the other hand, may spend too much effort generating stage t — 1 
cuts when the stage t cuts do not yet give a good approximation of the expected costs to be incurred 
in stages 1, . . , , T. The “best” tree traversing strategy will properly balance the quality of the cuts 
(and hence the lower bound) with the computational effort required to generate them. The search 
for this balance motivates considering strategies that range between shuffle and cautious. Fastpass 
is one such strategy; the e-strategies enable us to more fully investigate intermediate strategies. 

Tree Traversing Theorem 

In shuffle, a subproblem only receives a cut from its descendants after each descendant subproblem is 
solved with respect to its descendants. In this context, solved means the upper and lower bounds on 
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the optimal objective function value coincide. The tree traversing theorem states that a cut passed 
back to a subproblem “prematurely” (i.e., while the gaps in descendant objective function bounds 
are still positive) is a valid cut. 

Theorem 3 A cui passed back to sub(iJt) when the subproblems, sub(ujf^\)f G A(u;t), are not 
solved with respect to their descendants (due to insufficient cuts) is a valid cut. 



Proof 

This result follows directly from Lemma 1 and Lemma 2: The subproblems contain valid cuts, and 
the dual variables associated with optimal solutions to the subproblems are dual feasible. ■ 



In light of Theorem 3, one has a great deal of freedom in designing algorithms with respect to 
the order in which the subproblems of the decision tree are solved. Our goal is to devise strategies 
that allow us to solve large-scale multistage stochastic linear programs as quickly as possible. 



Formal Strategy Definition 

Absolute error and discrepancy are two useful concepts in formally defining the tree traversing 
strategies. The absolute error, for the stage t subproblem under scenario Wt, i.e., sub(u;t), 

is 






E E 

» = «+! 



Pi 






This expression is the difference between upper and lower bounds on the optimal objective function 
value for sub(u?^); note that it depends on sub(u;t)’s right-hand^side and hence its ancestor’s de- 
cisions. The absolute error for the stage T subproblems is zero because these linear programs are 
solved directly. The A notation (see §2) may also be applied to a set: A(5) = U 3€5 means 

apply A n times, e.g., A^(u>t) = A(A(u;t)). 



Scott [12] defined discrepancy, Vt, in the deterministic case. We extend the notion of discrepancy 
to the stochastic setting. 



V,{x^ 




pr;r'“' (cr;r<;r+cri[-c 
) J 



is the discrepancy for sub(w(). The second term in the discrepancy, 6 ^ , represents sub(w()’s estimate 
of the expected cost to be incurred by its descendants in stages <+ 1, . . . , T. The first term represents 
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the conditional expected value of the cost realized in stage t -{- 1 plus the stage t -|- 1 estimate of the 
expected cost in stages / -h 2, . . . ,T. The discrepancy for any stage T subproblem is zero. 

We now define two subroutines: forward{u,v) and backward{v,u). The former sweeps forward 
from stage u to stage v. It first forms the right-hand-sides of the appropriate stage u subproblems 
and solves them, and then it forms the right-hand-sides of the appropriate stage u-f 1 subproblems 
and solves them . . . until it has solved all the appropriate stage v subproblems. On the other hand, 
backward{vyu) first passes cuts to the appropriate stage v subproblems and solves them, and then 
passes cuts back to the appropriate stage v — 1 subproblems and solves them . . . until it has passed 
cuts back to the appropriate stage u subproblems (it does not solve them). In the execution of 
forward{u^v) and backward{vyu), it is not always necessary to solve every subproblem that we might 
address. For example, in subroutine forward{u, t>), some of the stage u subproblems (and therefore its 
descendants) might have a sufficiently small absolute error, and in subroutine backward{vyu)y some 
of the stage v subproblems (and possibly its ancestors) might have a sufficiently small discrepancy. 
For declaring specific subproblems temporarily “solved” in this fashion, we use discrepancies in the 
cautious strategies and absolute errors in the shuffle and fastpass strategies. 

Because the Benders cuts form an outer linearization of the recourse function, Vt{x^\uJt) > 0. 
Furthermore, = 0 implies the cut that would be passed to sub(u;t) is redundant. Two 

more useful facts relating absolute error and discrepancy are: 

E^,V,(xr\u;,) = E^,Mxr ( 6 ) 

r-1 

<=1 

When >ti(xi,u;i) < toler • min(|t/|, |L|), SLP-T is declared to be solved where ioler is a prespecified 
tolerance. U and L denote the upper and lower bounds on the optimal objective function value 
that the decomposition algorithm continually updates. The definitions of sufficiently small expected 
absolute error and sufficiently small expected discrepancy used in the tree traversing strategies are 
motivated by (6) and (7). The cautious and c-cawfmws strategies begin with one iteration of fastpass 
so initial upper and lower bounds on the optimal objective function value may be defined. 
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(1) fasipass 

step 0 
step 1 
step 2 
step 3 
step 4 


define ioler\ set iter = 1 
forward(\^T) 

if A\ (xi , u/i ) < ioler • min(|t/|, \L\) then stop; optimal solution at hand 
backward(T —1,1); iter = iter -f 1 
go to step 1 


(2) shuffle 

step 0 
step 1 
step 2 
step 3 
step 4 
step 5 
step 6 


define toler; set iter = 1, t = 1 
forward{ty T) 

if Ai (ri , u;i) < toler • min(|f/|, \L\) then stop; optimal solution at hand 
t = max {i ; u/,) > ^3^ • fo/er • min(|f/|, |L|)} 

backward(ty f) 

if t = 1 then iter = iter -f- 1 
go to step 1 


( 3 ) cautious 

step 0 
step 1 
step 2 
step 3 


define to/cr; set iter = 1, = 1, <2 = 2 

apply one iteration of the fastpass algorithm; iter = iter 1 

foTward{ti , <2) 

i( t2 —T then 

iter = iter -f 1; 

Ai{xiyUi) < toler ‘ min(\U\,\L\) then stop: optimal solution at hand 


step 4 


if < f^.mm(\U\,\L\) then 

h =<2 + l;<2 = <i 

else 

hacku)ard{t2 — 1, 1); 

= 1; <2 = 2 


step 5 


go to step 2 


( 4 ) e-shuffle 

step 0 
step 1 
step 2 
step 3 
step 4 
step 5 
step 6 


define toler, e; set ifer = 1, t = 1 
forward(t, T) 

if A\ (xi ,u;i) < toler • min{\U\, \L\) then stop: optimal solution at hand 
t = max {max {i : , u;,) > • e • mi 7 i(|f/|, |L|) } , 1 } 

backward{T — 1,<) 
if < = 1 then iter = itcr + 1 
go to step 1 



( 5 ) {-cautious 



step 0 
step 1 
step 2 
step 3 


define toler, e; set ifer = 1, <1 = 1, <2 = 2 

apply one iteration of the fastpass algorithm; iter = iter + 1 

forward(t \ , <2) 

if <2 = ^ then 

tier = iter 1; 

if (xi , u;i ) < io/er • min(|t/|, |£|) then stop: optimal solution at hand 


step 4 


if <r-l then 

<1 = <2 + 1; <2 = t, 

else 

backward(t2 — 1,1); 
ti = 1; (2 = 2 


step 5 


go to step 2 
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Computational Results 

Table 7 restates the base case (fasipass) strategy’s results for convenient reference. Tables 8 and 9 
use “x increase” with respect to fasipass as the performance measure; if this ratio is less than 1.0, 
the strategy outperforms fasipass on that particular problem. By examining the # Subs columns in 
Tables 7 and 8 one can contrast the distribution of computing effort per stage for three strategies. 



Name 


fasipass 


CPU (sec.) 


iter. 


# Subs 


Moke3.9 


55.4 


10 


10-59-99 


Ybsf3.9 


119.8 


7 


7-47-72 


Moke4.45 


221.4 


10 


10-64-274-459 


Ybsf4.45 


404.0 


8 


8-83-205-369 



Table 7: Base Case Strategy - fasipass 



Name 


cauiious 


shuffle 


X increase 


iter. 


# Subs 


X increase 


iter. 


# Subs 


Moke3.9 


0.92 


7 


19-82-72 


1.08 


7 


7-51-135 


Ybsf3.9 


1.03 


6 


21-86-63 


1.21 


6 


6-45-117 


Moke4.45 


1.06 


8 


27-140-449-369 


2.10 


4 


4-40-296-852 


Ybsf4.45 


1.00 


6 


29-202-274-279 


2.18 


8 


8-71-263-771 



Table 8: cauiious amd shuffle 



In Table 9 the “x increase” column is the average of the CPU ratios for the four test problems. This 
value is displayed for some specific values of c in the c-cauiious and c-shuffle strategies. 



c-cauitous 


c-shuffle 


c 


X increase 


c 


X increase 


0.0001 


1.00 


0.0001 


1.64 


0.0004 


0.98 


0.0004 


1.52 


0.0016 


0.97 


0.0016 


1.32 


0.0064 


0.95 


0.0064 


1.16 


0.0256 


1.06 


0.0256 


1.08 


0.1024 


1.01 


0.1024 


0.96 


0.4096 


1.01 


0.4096 


1.00 



Table 9: c-cauiious and c-shuffle 



The computational results indicate the fasipass^ cauiious^ and c-cauiious strategies perform com- 
parably and outperform shuffle and c-shuffle. It is only as c becomes large (and hence c-shuffle 
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approaches fastpass) that e-shuffle performs comparably. It is interesting to note how little the 
e-cauiious running times change as e varies. An e-cauitous strategy' seems appealing in a multistage 
problem (especially with discounting) where there is a desire to avoid the computationally expensive 
later stages. In these problems, however, there was not a significant difference between the three and 
four stage problems for the cautious (also e-cauitous) strategies. On the other hand, the shuffle (also 
e-shuffle) strategy’s performance is significantly worse on the four stage problems. As one might 
expect, the “extreme” strategies’ performance deteriorates when the advanced start procedure is 
removed; the average CPU ratios of cautious and shuffle to fastpass in this case are 1.13 and 2.07, 
respectively. 

5 Direct LP Optimizers 

The performance of the enhanced decomposition algorithm (i.e., the base case) and general LP 
optimizers on an enlarged set of test problems is summarized in Table 10. These eight problems are 
based on the Mokelumne and Yuba-Bear-South Feather river basin models with 1, 9, 27, and 45 
scenarios. We use the “x increase” measure with respect to the base case for three other LP solution 
strategies: (i) the primal-dual predictor-corrector interior point algorithm as implemented in IBM’s 
OSL Release 2; (ii) this same interior point algorithm followed by the simplex method to generate an 
extreme point solution; (iii) the primal simplex method as implemented in CPLEX 2.0. The results 
show that on single scenario problems, the decomposition algorithm is outperformed by general LP 
optimizers, but as the number of scenarios grows, the decomposition algorithm is preferable. 

The decomposition algorithm terminates when the relative error is less than 10”"*. The same 
duality gap tolerance was used in both interior point solution strategies; all tests were performed 
on a Hewlett Packard 9000/750 workstation. Due to memory limitations (OSL’s dspace array was 
allocated 64 Mb) we were unable to solve the largest (Ybsf4.45) problem via the interior point 
strategies. Recall (§2) that the stochastic hydro scheduling problems are subproblems in a leirger 
nonlinear Dantzig-Wolfe decomposition algorithm. Thus extreme solutions are desirable, and this is 
why the corresponding time to generate an optimal extreme point solution (via the simplex method) 
from the interior point solution is shown in Table 10. 
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Name 


Base 

Case 


Int. Pt. 


Int. Pt. 
Simplex 


Simplex 




(sec.) 


X increase 


Moke4.1 


16.8 


0.32 


0.46 


0.40 


Moke4.9 


85.0 


1.00 


2.41 


6.31 


Moke4.27 


132.9 


2.26 


6.94 


27.54 


Moke4.45 


221.4 


3.68 


11.37 


47.18 


Ybsf4.1 


48.1 


0.23 


0.42 


0.59 


Ybsf4.9 


177.3 


1.12 


2.94 


16.04 


Ybsf4.27 


252.9 


2.15 


14.84 


61.53 


Ybsf4.45 


404.0 


N/A 


N/A 


100.20 



Table 10: Comparison with Direct LP Optimizers 



6 Summary 

We have described a number of enhancements to the nested Benders decomposition algorithm for 
multistage stochastic linear programming. The enhanced algorithm is a small, but important part 
of an ongoing research and development project at The Pacific Gas and Electric Company; see 
Jacobs et al. [10] for a more detailed description of the project. The performance of the algo- 
rithm was examined on a collection of multistage stochastic hydroelectric scheduling problems. The 
computational results indicated the single most important enhancement is a warm start technique 
which utilizes optimal basis information from previous subproblem solutions. We also described a 
streamlined advanced start procedure that generates preliminary cuts to help guide the early iter- 
ations of the decomposition algorithm; this enhancement provided additional speedup over naive 
implementations. We found the multicut method due to Birge and Louveaux [4] also yielded com- 
putational savings over its single cut counterpart. Consistent with earlier findings of Abrahamson 
[1] and Wittrock [15] (in the deterministic case) and Gassmann [7] (in the stochastic case) we found 
that the fasipass tree traversing strategy performed well. However, a new class of c-cauiious tree 
traversing strategies produced comparable results to fasipass; further investigation of this class of 
strategies may be warranted for problems with many stages. Finally, we showed that taking advan- 
tage of the problem’s special structure through the enhanced decomposition algorithm provides a 
computationally attractive alternative to direct LP optimizers on medium to large-size problems. 
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